Using the keyboard:
This interactive program requires you to use the keyboard at certain times. For example, a message will tell you when you can press a key in order to continue to the next step of the demonstration. When you see this message, you can press the space bar or the enter key to continue. You need not hold the key down. If you do not touch the keyboard, the program will proceed automatically at about
seconds per step.
The message at the bottom of this screen indicates that you can press the space bar or enter key now.
Sometimes, there will be special keys that you can press in order to change the way that information is being displayed or to change the flow of the demonstration. These special keys will be shown as the choices on a menu or may be shown on the bottom line of the screen. Pressing "D" will change the number of decimals in the display at the bottom of the screen right now, for example.
Tapping the key will suffice. You need not press the enter key.
The program is counting
how many times you have
pressed the "D" key.
Press
Space-
Decimals Count =
Each of the demonstrations includes a sampling experiment during which the "P", "Q", and "Space" keys have special functions.
You can cause the experiment to pause after each sample or to sample without pausing by pressing the "P" key, which acts as an on-off switch. Press the "P" key a few times now and watch the effect on the sampling below.
Pressing the space bar causes the next sample to be drawn. Try it while in Pause mode. Pressing the "Q" key stops the sampling and advances the program to the next topic.
Pause
Quit
Pause mode is
in effect.
Sample Number =
Sample Number =
At many times, you can return to the previous topic by pressing the PgUp key.
Try it now to go back to the previous screen.
The PgUp, or PageUp, key is a cursor key. The cursor keys are on the number keys on many keyboards. You may need to press the NumLock key to activate the cursor keys if the numbers and cursor keys are on the same keys.
That was the 9 instead of the PgUp key.
Press the NumLock key and try again.
PgUp-
When you are asked to enter a number, you can type a number and press the enter key. If you don't know how big a number to choose, just press the enter key and the program will choose an appropriate value. The program will also choose an appropriate value if you enter a number that is outside the range that it can handle.
Special keys, such as the space bar and the PgUp key, are not active when the computer is waiting for you to enter a number.
Try entering a number now:
The program chose
Hardware requirements: StaTutor
runs on an IBM PC with DOS 2.0 or higher and requires a graphics adapter board.
CGA, EGA, VGA, and Hercules adapters are currently supported. AT&T and IBM 3270 may also work (not tested).
With a CGA adapter, the display can be changed between 80- and 40-column text width.
Forty-column resolution allows use with a projection system in a classroom environment.
The program may also need as much as 320K of free memory.
StaTutor
helps to demonstrate several basic statistical concepts but does not teach computational skills.
is designed to complement a course in elementary statistics.
has been developed for use by an instructor in a classroom environment or by students enrolled in a statistics course.
The student should consult texts, teachers, or other resources to learn details of the results shown by these demonstrations as well as to learn how and why they work.
Registration and Distribution:
This program may be freely distributed, but may not be sold.
I welcome constructive comments and suggestions about the program.
Robert A. Wolfe
Department of Biostatistics
University of Michigan
Ann Arbor, MI 48109
Statistical Details: All of these demonstrations are based on classical statistical methods and they use a frequentist sampling orientation.
Generation of Univariate Populations: Univariate populations are generated by an inverse probability transformation on equally spaced probabilities on the unit interval.
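A minimal Python sketch of an inverse probability transformation applied to equally spaced probabilities. The helper name `make_population`, the population size of 1000, and the choice of midpoint probabilities (k - 0.5)/size are assumptions for illustration; this is not StaTutor's Turbo Pascal code.

```python
import math
from statistics import NormalDist

def make_population(size, inv_cdf):
    """Hypothetical helper: build a finite population by applying an inverse
    CDF to equally spaced probabilities on the unit interval.

    Assumption: midpoints (k - 0.5) / size, k = 1..size, are used so that no
    probability is exactly 0 or 1, where the inverse CDF may be infinite.
    """
    return [inv_cdf((k - 0.5) / size) for k in range(1, size + 1)]

# Gaussian population: inverse of the standard normal CDF.
gaussian_pop = make_population(1000, NormalDist().inv_cdf)

# Skewed population (exponential with mean 1): inverse CDF is -ln(1 - p).
exponential_pop = make_population(1000, lambda p: -math.log(1.0 - p))
```

Because the probabilities are fixed rather than random, the population histogram follows the target distribution smoothly, and its values come out already sorted.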
Univariate Sampling Methods: Sampling experiments are carried out by sampling from finite populations with replacement.
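Sampling with replacement from a finite population can be sketched with Python's `random.Random.choices`, which draws each element independently. The helper name `draw_sample` and the example population of the values 0 through 9 are assumptions for illustration.

```python
import random

def draw_sample(population, n, rng):
    """Hypothetical helper: draw n values from a finite population with
    replacement. Each draw is independent, so a member can appear more than once."""
    return rng.choices(population, k=n)

rng = random.Random(2024)        # seeded so the run is reproducible
population = list(range(10))     # e.g. the values 0..9, equally frequent
sample = draw_sample(population, 25, rng)
```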
Generation of Bivariate Data: Bivariate data are generated conditionally on the X-values with pseudo-random Gaussian errors for the Y-variable (polar method).
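The text names only the polar method; the following is a minimal sketch of Marsaglia's polar method and of generating Y conditionally on fixed X-values. The function names, the seed, and the example intercept, slope, and sd(Y|X) are assumptions for illustration, not the program's own routine.

```python
import math
import random

def polar_normal_pair(rng):
    """Marsaglia's polar method: returns two independent standard normal deviates."""
    while True:
        u = 2.0 * rng.random() - 1.0
        v = 2.0 * rng.random() - 1.0
        s = u * u + v * v
        if 0.0 < s < 1.0:                      # accept only points inside the unit circle
            factor = math.sqrt(-2.0 * math.log(s) / s)
            return u * factor, v * factor

def bivariate_sample(xs, intercept, slope, sd, rng):
    """Hypothetical helper: Y-values generated conditionally on fixed X-values,
    Y = intercept + slope * X + sd * E, with E a standard normal error."""
    ys, spare = [], []
    for x in xs:
        if not spare:                          # the polar method yields deviates in pairs
            spare.extend(polar_normal_pair(rng))
        ys.append(intercept + slope * x + sd * spare.pop())
    return ys

rng = random.Random(7)
xs = [float(x) for x in range(1, 21)]
ys = bivariate_sample(xs, intercept=2.0, slope=0.5, sd=1.0, rng=rng)
```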
Bivariate Sampling Methods: Samples are obtained by generating new pseudo-random observations for each sample. In order to increase the speed of display, the "population" used with prediction bands does not change from sample to sample.
Confidence Intervals and Inference: Univariate confidence intervals are based on the central limit theorem. These confidence intervals use percentiles from the Gaussian distribution. Regression inference is based on the t-distribution with 2-decimal accuracy. Because of the discrete nature of the binomial distribution, only certain sample sizes are allowed with the binomial population so that the histogram of the means will be more interpretable.
Acknowledgements:
The development of StaTutor was supported, in part, by a grant
from The Center for Research
in Learning and Teaching
of the University of Michigan.
StaTutor is written in
TURBO Pascal.
Return to Main Menu
Sound, Screen resolution and color
Sound, Screen colors
Statistical Details
Teaching Objectives and Use
Hardware Requirements
Acknowledgements
Keyboard Use
Copyright and Distribution
Further Information about the
Statistical Demonstrations
PC3270
Monochrome Hercules
ATT400
64K EGA
Monochrome EGA
by
Graphics Error: This program requires a graphics adapter. CGA, EGA, VGA, and Monochrome Hercules adapters are supported.
To change the colors press
one of the following keys:
"M" for Monochrome (shown on this line)
"C" for the default colors
"B" to change the background
"T" to change the text color
"H" to change the highlight color.
Press "R" for 40-character lines with 4-color graphics screens and multi-color text screens.
Press "R" for 80-character lines with 2 colors.
"S" to turn sound
off.
"D" to make
dimensional histograms.
Press the
Space
bar, the long key in the middle at the bottom of the
keyboard, or the
enter key, to
return to the prior menu.
Press
Space
-Done with choices
M,C,B,T, or H
-Color choices
-Change Resolution
Testing graphics resolution
and text font size.
Graphics Driver is
with resolution
Press any key to continue
Press
Space
-Done
-Return
-Next
-Help
PgUp
-Back
-Pause
-Quit
-Auto demo
Sampling from a Population
Confidence Intervals for the Mean
confidence interval
is a statistical tool that is used to make an inference about the value of the population mean based on sample data. Although the sample data do not tell the exact value of the population mean, the data can be used to compute an interval that is likely to contain the value
of the population mean.
A confidence interval for a parameter, such as the population mean, consists of two numbers: the upper and lower
confidence
limits
for the parameter. The probability that the two numbers capture the population value between
them is called the confidence
coefficient
.
This program will compute and show confidence intervals for the mean for a series of samples from a population.
First, some of the technical details:
Because of the
Central Limit Theorem
, the limits equal to the sample mean
plus or minus
times sd /
n can be expected to capture the
population mean for about % of the random samples of size n that could be drawn from a population, if the
sample size, n, is large.
+ or -
* sd /
is an approximate
% confidence interval for the population mean.
(sd is the sample standard deviation.)
Review the method for computing a confidence interval in a textbook if you don't know how to do it.
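The approximate interval described above (sample mean plus or minus a Gaussian percentile times sd / sqrt(n)) can be sketched in Python with the standard `statistics` module. The function name `approx_ci`, the default 95% coefficient, and the example data are assumptions for illustration.

```python
import math
from statistics import NormalDist, mean, stdev

def approx_ci(sample, conf=0.95):
    """Hypothetical helper: approximate CLT-based confidence interval for the
    population mean, sample mean + or - Z * sd / sqrt(n), Z a Gaussian percentile."""
    n = len(sample)
    if n < 2:
        raise ValueError("need at least two observations to estimate sd")
    z = NormalDist().inv_cdf(1.0 - (1.0 - conf) / 2.0)   # about 1.96 for 95%
    half_width = z * stdev(sample) / math.sqrt(n)          # stdev divides by n - 1
    m = mean(sample)
    return m - half_width, m + half_width

lower, upper = approx_ci([3, 7, 2, 9, 5, 4, 8, 6, 1, 5], conf=0.95)
```

Using a t percentile instead of Z would widen the interval slightly for small n; the Gaussian percentile matches the approximation described here.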
This program will compute an
approximate
% confidence interval for the population mean based on the data from each of a series of samples
from the
population.
How many observations would you like
to have in each sample?
Please enter an integer:
These approximate confidence intervals are based on the approximate Normal (Gaussian) distribution of the sample mean. An approximation based on the t-distribution may be more accurate since the standard deviation is estimated from sample data.
+ or -
* sd /
Approximate
% confidence interval.
The confidence coefficient is the fraction of potential random samples from the population for which the confidence interval will enclose the parameter value.
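The confidence coefficient can be checked empirically by repeating the sampling experiment and counting how often the interval encloses the population mean. This is a minimal sketch assuming a population of the values 0 through 9 (equally frequent), samples of size 30 drawn with replacement, and a hypothetical helper `coverage_rate`.

```python
import math
import random
from statistics import NormalDist, fmean, stdev

def coverage_rate(population, n, conf, n_samples, rng):
    """Hypothetical helper: fraction of repeated samples whose approximate
    interval (sample mean + or - Z * sd / sqrt(n)) encloses the population mean."""
    mu = fmean(population)
    z = NormalDist().inv_cdf(1.0 - (1.0 - conf) / 2.0)
    captured = 0
    for _ in range(n_samples):
        sample = rng.choices(population, k=n)          # sampling with replacement
        half_width = z * stdev(sample) / math.sqrt(n)
        m = fmean(sample)
        if m - half_width <= mu <= m + half_width:
            captured += 1
    return captured / n_samples

rate = coverage_rate(list(range(10)), n=30, conf=0.95, n_samples=2000,
                     rng=random.Random(11))
```

Because the interval is only approximate, the observed rate typically lands near, but not exactly at, the nominal 95%.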
You can choose the approximate
confidence coefficient now.
observations
per sample.
Approximate
Confidence Interval.
A sample is shown in
the top histogram.
The sample mean and
confidence interval
are marked by
vertical and
horizontal lines.
The confidence
interval changes
from sample to
sample but the
parameter does not.
The standard
deviation for this
sample is 0.
interval is not
very interpretable.
population.
In practice, the value of the population mean is unknown and must be estimated
from sample data.
In this demonstration, the population distribution is known. A histogram is shown below, along with the value of the population mean. The vertical line at the top of the histogram marks the
value of the mean.
Values near 0 are
most common.
Any value is
possible but only
the range between
+ and - 7 is shown.
There are two
groups in this
population
labeled 0 and 1.
The mean is the
fraction in
group 1.
The values range
between 0 and 9.
All values occur
equally frequently.
Any positive value
is possible.
Only the range
0 to 7 is shown.
Small values are
more common.
+ and - 4 is shown.
Population
A random sample of
observations is shown.
The positions of
the population and
sample means are
shown and a line
extends between the
lower and upper
confidence limits.
In practice, only one sample is available and we do not know the value of the population mean or whether or not it was captured by the confidence interval.
In this demonstration, samples will be drawn from the population repeatedly in order to show how the confidence interval varies from sample to sample.
Sample
The confidence interval is centered at the sample mean. The line marking the
interval does
capture the population mean for this sample.
The confidence limits are designed
to capture the population mean
between them for about
% of the
samples of size
that could be
drawn from the population.
You can change any of the 3 things
listed below and continue the sampling experiment or you can quit.
Sample Size (
Confidence Coefficient (
Population (
Enter a new sample size:
Population
Sample
Parameter captured. missed.
Captured by
of samples so far.
1. The confidence interval is computed using sample data. An approximate confidence interval for the mean can be computed without any knowledge about the population distribution if the sample size is large.
2. The confidence interval does not always capture the parameter value. In practice, we do not know if a particular confidence interval captures the population parameter.
3. The confidence interval changes from sample to sample.
4. The confidence coefficient is the fraction of all the possible samples that would yield a confidence interval that captured the parameter.
5. The confidence interval is wider if
   a. The sample size is smaller.
   b. The confidence coefficient is higher.
   c. The sample values are more variable.
6. An approximate 1-p confidence interval for the mean is often computed as:
$$\bar{X} \pm Z(1-p/2)\,\frac{sd}{\sqrt{n}}$$
where Z(1-p/2) is a percentile from the normal distribution.
7. A percentile from the t-distribution is often used instead of Z.
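Points 5a through 5c follow directly from the width formula 2 * Z(1-p/2) * sd / sqrt(n). A minimal sketch, with the hypothetical helper `ci_width` and example values for sd, n, and the confidence coefficient chosen only for illustration:

```python
import math
from statistics import NormalDist

def ci_width(sd, n, conf):
    """Hypothetical helper: full width of XBar + or - Z(1-p/2) * sd / sqrt(n),
    where conf = 1 - p."""
    p = 1.0 - conf
    z = NormalDist().inv_cdf(1.0 - p / 2.0)
    return 2.0 * z * sd / math.sqrt(n)

base = ci_width(sd=2.0, n=25, conf=0.95)
print(base < ci_width(2.0, 16, 0.95))   # 5a: smaller sample, wider interval
print(base < ci_width(2.0, 25, 0.99))   # 5b: higher confidence, wider interval
print(base < ci_width(3.0, 25, 0.95))   # 5c: more variable data, wider interval
```

Since the width scales with 1 / sqrt(n), quadrupling the sample size halves the width.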
Available Control Keys:
utomatic Demonstration.
Return to the
B eginning.
uit Sampling.
uit Program.
elp Screen.
Select
ew Population
and Sample Size.
ause/Continuous Sampling.
Pgup-
Return to Previous Topic.
Space-
Continue to Next
Sample.
Topic.
Different keys may be available at
other points in the program.
The control keys are
available when the program is waiting for you to enter a number or make a selection from a menu.
Quit
Press
Space
-Done
-Return
-Next
-Help
PgUp
-Back
-Pause
-Quit
-Scale
N -New demo
-Auto demo
Sampling from a Population
Variability of the Sample Mean
Central Limit Theorem Approximation
The tools of statistics can help us
to draw conclusions about the
characteristics of all of the
subjects in a
population based on the characteristics of the subjects
in a
sample
from the population.
Although the characteristic being studied could be observed for all members of the population, it need only be observed for a part of the population called the sample. A numerical summary based on the
whole
is often called a
parameter
. A numerical summary
based on the is often
called a statistic
The value of a
used to guess, or
estimate
, the
value of a parameter
The average, or mean, is a numerical summary of the typical
value of a characteristic.
The sample average is the sum of the
values observed divided by the
number of observations in the
sample.
Let N denote the number of observations in the sample. Let $X_i$ denote the value for the i'th observation. The sample average is computed as:
$$\bar{X} = \frac{1}{N}\sum_{i=1}^{N} X_i$$
This number is often referred to as "X Bar" because of the line over the X.
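The sample-average formula translates directly into code. A minimal sketch with a hypothetical helper `sample_average` and made-up example values:

```python
def sample_average(values):
    """Hypothetical helper: XBar = (1/N) * (X_1 + X_2 + ... + X_N)."""
    n = len(values)
    if n == 0:
        raise ValueError("the sample must contain at least one observation")
    return sum(values) / n

xbar = sample_average([4, 8, 6, 2])   # (4 + 8 + 6 + 2) / 4 = 5.0
```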
Data from the sample are often used to make guesses about the population. For a random sample with a large number of observations, it is likely that the histograms of the sample and population data are similar and that numerical summaries based on sample data are close to the corresponding summaries of the population values.
Thus, it is plausible to guess that the value of the sample average is close to the average of all of the values in the population, if it is based on a large random sample. The process of making conclusions about a population based on sample data is called statistical inference.
This program will demonstrate that the value of the sample mean varies depending upon which sample is drawn from the population.
How many observations would you like
to have in each sample?
Please enter an integer:
Binomial
The sample size has been set equal to
in order to make the displays
more interpretable.
Sampling from the
population.
In practice, details about the values in the population are often unknown and must be estimated from sample data.
In this demonstration, the population distribution is known. A histogram is shown below, along with the value of the population mean. The arrow at the top of the histogram marks the value of
the population mean.
Any value is
population:
Values near 0 are
most common.
possible but only
the range between
+ and - 7 is shown.
There are two groups
in this population
labeled 0 and 1.
The mean is the
fraction in group 1.
The values range
between 0 and 9.
All values occur
equally frequently.
Any positive value
is possible.
Only the range
0 to 7 is shown.
Small values are
more common.
+ and - 4 is shown.
Population
Review of Concepts
1. The individual values in the population have a distribution which is unknown.
2. The observations in a random sample can be used to make inferences (guesses) about the population.
3. The sample histogram approximates the population distribution.
4. The sample mean estimates the population mean.
5. The value of the sample mean can
vary from sample to sample.
6. There are a very large number of potential random samples that
could be drawn from most
populations.
7. The distribution of the potential values of the sample means from all the potential samples of size
is approximately
may not be close to
Gaussian (Normal) since the number of observations in each
sample is
large
small
samples have been drawn so far.
There were
observations per sample.
The distribution of the sample averages of all random samples is nearly Normal if the number of observations is large.
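The Central Limit Theorem behavior described in point 7 can be sketched by drawing many samples from a skewed population and comparing the spread of their means with sigma / sqrt(n). The exponential-shaped population, the sample size of 30, the 2000 repetitions, and the helper name `sample_means` are assumptions for illustration.

```python
import math
import random
from statistics import fmean, pstdev

def sample_means(population, n, n_samples, rng):
    """Hypothetical helper: means of n_samples random samples of size n,
    each drawn with replacement."""
    return [fmean(rng.choices(population, k=n)) for _ in range(n_samples)]

# A skewed population built from equally spaced probabilities (exponential shape).
population = [-math.log(1.0 - (k - 0.5) / 1000) for k in range(1, 1001)]
means = sample_means(population, n=30, n_samples=2000, rng=random.Random(42))

# Theory: the means center on the population mean, with a standard
# deviation of about sigma / sqrt(n).
predicted_sd = pstdev(population) / math.sqrt(30)
```

A histogram of `means` would look far more symmetric and bell-shaped than the population itself.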
Go back and look at
just a few more samples
so that this makes sense.
The Normal approximation can be better
seen after you draw more than
samples.
The density function
of the Normal
distribution is shown
with the histogram
of the sample means
from samples.
Tables of the Normal distribution can be used to approximate how big the difference between the sample
mean and the population mean
is likely to be, even
when we don't know
the value of the
population mean.
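Here `statistics.NormalDist` plays the role of the Normal table. This is a minimal sketch assuming the population standard deviation sigma is known (here, the values 0 through 9 equally frequent, so sigma squared = 99/12), with a hypothetical helper `prob_within`.

```python
import math
from statistics import NormalDist

def prob_within(distance, sigma, n):
    """Hypothetical helper: Normal-approximation probability that the sample
    mean lies within `distance` of the population mean, for samples of size n."""
    se = sigma / math.sqrt(n)          # standard error of the sample mean
    z = NormalDist()
    return z.cdf(distance / se) - z.cdf(-distance / se)

sigma = math.sqrt(99 / 12)        # population sd of the values 0..9, equally frequent
p = prob_within(1.0, sigma, 25)   # chance the sample mean is within 1 unit of mu
```

The calculation needs only sigma and n, not the population mean itself, which is why it applies even when the mean is unknown.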
Press
-change scale.
-more samples.
Samples
Means
A histogram for a
random sample of
observations is shown
to the right.
The value of the
sample mean is shown
and its position is
marked by an arrow.
In practice, only one sample is used for inference about a population.
Based on this sample, one might infer
that about
% of the population
is in group 1.
that the population mean is close
.
In this demonstration, samples will be drawn from the population repeatedly
in order to show how much the estimates vary from sample to sample.
Sample
There have been
different samples of
size
so far.
The sample means have been
The first
sample means were
A histogram of these
sample
means is shown.
The x-scale of the histogram will change when you press the "s" key.
Samples
Means
Population.
A random sample of
observations is
shown at the right.
Random samples of
observations
are being drawn.
The histogram of all
the means of the
samples seen so far
below it.
shown.
is getting closer to
the distribution of
the means from all
possible samples.
Population
Samples
Means
Speeding up!
Sample
Mean=
Record the mean
Available Control Keys:
utomatic Demonstration.
Return to the
B eginning.
uit Sampling.
uit Program.
elp Screen.
ew Population and Sample Size.
ause/Continuous Sampling.
cale change.
Pgup-
Return to Previous Topic.
Space-
Continue to Next
Sample.
Topic.
Different keys may be available at
other points in the program.
The control keys can
be used when the program is waiting for you to enter a number or select from a menu.
Quit
Press
Space
-Next
-Help
PgUp
-Back
-Pause
-Quit
L-No
L-for
lines
-Auto
Interpreting the
Correlation Coefficient,
the R-squared Statistic, and other
Measures of Linear Association
from Simple Linear Regression.
This program demonstrates several ways to measure linear association.
correlation coefficient
denoted here as
, and
R-squared will be computed for various X-Y
scatter plots.
You can learn that the numerical
values of R and R-squared are related to the visual information
in the X-Y scatterplot.
can range between
-1 and +1
while
can range between
0 and 1
is interpreted as the fraction of the variability of Y that is explained by the linear
relationship with X.
The
standard deviation
of Y about the regression line summarizes the typical vertical distance between an observed value of Y and the regression line. It measures how close the data points
are to the regression line.
It is denoted here by
sd(Y|X)
. The typical vertical distance is often computed as the square root of the "average" of the squared distances in the observed sample. The "average" is usually computed by dividing a sum by N-2 to yield an unbiased estimate of a parameter and to take into account the degrees of freedom. In this demonstration, the "average" squared distance is computed by dividing by N. This approximation will help to show how R-squared is
related to sd(Y|X) and sd(Y).
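With both standard deviations computed by dividing by N, as in this demonstration, the ratio sd(Y|X) / sd(Y) equals the square root of 1 - R-squared exactly. A minimal least squares sketch; the helper name `regression_summary` and the example data are assumptions for illustration.

```python
import math

def regression_summary(xs, ys):
    """Hypothetical helper: least squares fit plus the N-divisor standard
    deviations used in this demonstration (not adjusted for degrees of freedom)."""
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    syy = sum((y - ybar) ** 2 for y in ys)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = ybar - slope * xbar
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    r = sxy / math.sqrt(sxx * syy)
    sd_y = math.sqrt(syy / n)            # spread of Y about YBar
    sd_y_given_x = math.sqrt(sse / n)    # spread of Y about the regression line
    return slope, intercept, r, sd_y, sd_y_given_x

slope, intercept, r, sd_y, sd_yx = regression_summary(
    [1, 2, 3, 4, 5], [2.0, 4.1, 5.9, 8.2, 9.8])
# Same N divisor in both: sd(Y|X) / sd(Y) equals sqrt(1 - R-squared).
```

The identity holds because the residual sum of squares equals the total sum of squares of Y times (1 - R-squared); dividing by N-2 in one place and N in the other would break the exact relation.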
1. R ranges from -1 to +1. R is close to 0 if there is little linear association between Y and X.
2. R has the same algebraic sign as the slope of the regression line.
3. R-squared ranges between 0 and 1. R-squared is the fraction of the variability of Y that is explained by a linear relationship with X.
4. The sample regression line is typically closer to the observed values of Y than is the sample average of Y.
5. The typical distance between an observed value of Y and the regression line estimates the standard deviation of Y given X. It is denoted here by sd(Y|X).
Choose the
sample size
now.
Enter the sample size >
Random samples of size
will be drawn from various populations.
The least squares regression line and the data will be plotted for each
sample and the value of the correlation coefficient will be
computed.
If $\bar{X}$, $\bar{Y}$, sd(X), and sd(Y) denote the sample means and standard deviations of X and Y, respectively, and the X-Y data values are denoted by $(X_i, Y_i)$ for i=1,...,n, then the correlation coefficient can be computed as
$$R = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\mathrm{sd}(X)\,\mathrm{sd}(Y)\,(n - 1)}$$
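A minimal sketch of the correlation formula above, with sd(X) and sd(Y) using the n - 1 divisor so that the (n - 1) in the denominator matches. The function name `correlation` and the example data are assumptions for illustration.

```python
import math

def correlation(xs, ys):
    """Hypothetical helper:
    R = sum((X_i - XBar)(Y_i - YBar)) / (sd(X) * sd(Y) * (n - 1)),
    with sd(X) and sd(Y) computed using the n - 1 divisor."""
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    sd_x = math.sqrt(sum((x - xbar) ** 2 for x in xs) / (n - 1))
    sd_y = math.sqrt(sum((y - ybar) ** 2 for y in ys) / (n - 1))
    cross = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    return cross / (sd_x * sd_y * (n - 1))

r_up = correlation([1, 2, 3, 4], [2, 4, 6, 8])     # points on an increasing line
r_down = correlation([1, 2, 3, 4], [8, 6, 4, 2])   # points on a decreasing line
```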
The typical
difference
between Y and
the regression
line is shown
by the vertical
bar nearest
the scatterplot:
sd[Y|X]=
The correlation
coefficient is
between -1 and 1.
R is positive
R is negative
since the fitted
line is
increasing.
decreasing.
R-squared
measures the
strength of
the linear
relationship.
R-squared is
between 0 and 1.
The lines next
to the plot show
the standard
deviation of Y
about YBar and about the
regression line.
The standard
deviations are
computed from
an average as
shown by the
moving lines.
Press the L key
to get rid of
the moving lines.
to cause the
moving lines
to be shown.
Caution:
The measures of linear
association
shown here
are not
adequate for
summarizing
non-linear
relationships.
Technical detail:
deviations shown
here are not
adjusted for
degrees of
freedom.
Sums of squares
are divided by
the sample size.
The ratio of sd's
reported is the
square root of
1 - R squared.
If an individual
value of Y were
predicted using
line, the typical
error would
only be about
as big as it
would be if Y
were predicted
using YBar
for this sample.
Population
Sample Size
R-Squared=
Slope
sd[Y|X]
sd[Y]
YBar is shown by
the vertical bar
to the left:
sd[Y]=
Ratio of sd's
Control Keys:
A-
Automatic demonstration.
B-
Return to the
B eginning.
Q-
uit Program.
H-
elp Screen.
N-
Select
ew Sample Size.
P-
ause/Continue Demonstration.
L-
Show/Do not show moving
ines.
Pgup-
Return to Previous Topic.
Space-
Continue to Next
Sample.
Topic.
Please wait.
Press
Space-
Return
You Choose
Next
PgUp-
onfidence.
iction.
ause
Auto
Sampling from a Population
with Bivariate Data.
Variability of the Regression Line.
t-statistics and t-distributions.
Hypothesis Testing.
Confidence Intervals.
Prediction Intervals.
This program will demonstrate the
variability of the
least squares
estimates for the
regression line
Different
samples
from a
population
yield different sample regression
lines and sample
statistics
The
t-statistic
based on the
difference between the estimated
slope and the
slope will
be computed for each sample.
The histogram of these t-statistics
will
be shown on the screen.
In practice, the true slope is not known, and a hypothesized value is used to compute the t-statistic.
Confidence
bands and
prediction
bands can be shown for each sample
instead of the t-statistics.
This program will demonstrate
that confidence intervals for
the population regression line
can usually enclose the
population line even though
they are based entirely on
sample data.
bands will be computed for each of many samples drawn
from a bivariate population.
This program will demonstrate that
prediction bands for X-Y observations from the population
can usually enclose a large fraction of the population values
even though they are based
entirely on sample data.
Prediction
for each of many samples drawn
Designated Parameters:
Intercept,
Slope,
Standard Deviation, sd(Y|X) =
Parameters:
Intercept =
Slope =
sd(Y|X) =
Population
The first step is to choose the type of population that the samples will
be drawn from.
The population is chosen by specifying
the values of certain
parameters
. The values to be selected include the
slope
, intercept
and
standard
deviation.
You may choose the values by yourself,
or let them be chosen for you.
To let the values be chosen for you,
press
-for automatic demo.
Press the
Space-
bar to choose
them yourself.
Please choose values for the
parameters now.
Slope,
Intercept,
Standard Deviation, sd(Y|X):
Now choose the
sample size N
Enter the sample size >
With repeated sampling, the values of
t have a
t-distribution
with
degrees of freedom if
In order for the
confidence
prediction
intervals
to work well, the following
conditions should be met:
Y(i) =
*X(i) + E(i) , where :
1. E(i) are
independent
E(i) have
mean 0
The variance
of E(i) does not
depend on X.
E(i) have a
normal
distribution.
These are the conditions for the normal errors regression model.
Even when condition 4 is relaxed, the t-distribution sometimes approximates the sampling distribution of t.
confidence intervals perform well if the sample size is large.
Prediction intervals do not work well unless all of the conditions are met.
Review Regression Prediction Bands:
1. The prediction bands attempt to capture a large fraction of the X-Y points in the population.
2. The prediction bands are centered at the sample line, which varies from sample to sample.
3. 95% prediction bands bracket about 95% of the population X-Y points.
4. The width of the prediction bands varies from sample to sample mostly because the standard deviation of Y given X is estimated.
5. The actual fraction of the X-Y points covered depends strongly on the distribution of Y in the population and on how closely the population line is estimated.
Review confidence band for regression
1. The confidence band attempts to enclose the population regression line a large fraction of the time.
2. The confidence band is centered at the sample line, which varies from sample to sample.
3. 95% confidence bands should capture each point on the population line about 95% of the time.
4. Other confidence procedures based on the F distribution can be used to give a simultaneous confidence region for the whole line.
5. Confidence intervals give upper bounds on the error of an estimate that are usually correct (e.g., 95% of the time).
Review t-distributions and statistics.
1. The t-statistic based on the true parameter value has a t-distribution under certain conditions.
2. The t-statistic based on a value other than the parameter is likely to differ from 0 by more than would be expected from the t-distribution.
3. Note that, in practice, t-statistics are based on a hypothesized value of the slope since the true value is not known.
Review Estimation and Sampling
1. Statistical methods based on sample data can be used to make inferences about a population.
2. The least squares regression line is an estimate for the population regression line.
3. We usually do not know the actual error of an estimate because we do not know the population value.
4. The t-statistic values computed using the true parameter value have a t-distribution, under certain conditions.
5. The t-distribution can often be used to help make probabilistic inferences about population parameter values.
Next, each of the t-statistics will be compared to the t-distribution to
determine whether the
sample
slope
was
significantly different
from
the
population
slope. The percent of the samples that yielded a significant result at the 5% significance level will be
reported. Except for sampling variability, the percent should
be close to 5%.
In addition, other t-statistics,
for testing the
null hypothesis
that the slope is equal to 0,
were computed for each sample and the fraction of the samples
that had slopes significantly different from 0 will be reported.
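The rejection percentages described above can be sketched with a small simulation: test the true slope and a slope of 0 on each sample and count two-sided 5% rejections. Assumptions, all for illustration only: n = 20 fixed X-values, true intercept 1.0, true slope 0.3, sd(Y|X) = 2.0, errors from `random.Random.gauss`, and the hypothetical helpers `slope_t` and `rejection_rates`. Python's standard library has no t quantile function, so the tabled value t(18, .975) = 2.101 is hard-coded.

```python
import math
import random

T_CRIT_DF18 = 2.101   # assumed tabled two-sided 5% critical value of t with 18 df

def slope_t(xs, ys, hypothesized_slope):
    """Hypothetical helper: t = (estimated slope - hypothesized slope) / se(slope)."""
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    slope = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / sxx
    intercept = ybar - slope * xbar
    mse = sum((y - intercept - slope * x) ** 2 for x, y in zip(xs, ys)) / (n - 2)
    return (slope - hypothesized_slope) / math.sqrt(mse / sxx)

def rejection_rates(true_slope, n_samples, rng):
    """Fractions of samples rejecting H: slope = true slope, and H: slope = 0."""
    xs = [float(x) for x in range(1, 21)]          # n = 20, so df = n - 2 = 18
    reject_true = reject_zero = 0
    for _ in range(n_samples):
        ys = [1.0 + true_slope * x + rng.gauss(0.0, 2.0) for x in xs]
        reject_true += abs(slope_t(xs, ys, true_slope)) > T_CRIT_DF18
        reject_zero += abs(slope_t(xs, ys, 0.0)) > T_CRIT_DF18
    return reject_true / n_samples, reject_zero / n_samples

rate_true, rate_zero = rejection_rates(0.3, 2000, random.Random(5))
```

Apart from sampling variability, `rate_true` should be close to 5%, while `rate_zero` is much larger because the true slope is not 0.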
Independent random samples of size
will be drawn from the values in
the population.
The slope, b, and intercept, a, for the population are estimated from each sample. With "^" over estimated values, the sample regression equation for the Mean(Y|X) can be written:
$$\widehat{\mathrm{Mean}}(Y|X) = \hat{a} + \hat{b}X$$
For each sample, the least squares estimate for the slope and the estimate for the standard error of the slope will be used to compute:
$$t = \frac{\hat{b} - b}{\mathrm{se}(\hat{b})}$$
Confidence bands
should capture
the population
line.
Prediction bands
capture X-Y
points.
are wider than
Confidence Bands.
Prediction Bands Discussion:
Prediction bands attempt to capture a large fraction of the population X-Y points between them.
Prediction bands are designed so that there is a high probability that a new random observation from the population will be between them. Prediction bands must account for the variability in the sample data as well as the variability in the new observation.
All of the normal errors regression model assumptions are important for the accuracy of prediction bands.
Prediction Bands for the X-Y points in the population are based on sample data and are centered at the sample
regression line.
The + or - deviation at a particular value of X is the product of:
1. t(n-2,.975) (95% Probability)
2. Square root of MSE (mean square error)
3. Square root of:
$$1 + \frac{1}{n} + \frac{(X - \bar{X})^2}{\sum_{i=1}^{n} (X_i - \bar{X})^2}$$
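The three-factor product above can be sketched directly. This assumes a hypothetical helper `prediction_half_width`, made-up example data with n = 6, and the tabled value t(4, .975) = 2.776 supplied by the caller, since Python's standard library has no t quantile function.

```python
import math

def prediction_half_width(xs, ys, x0, t_value):
    """Hypothetical helper: + or - deviation of a prediction band at X = x0,
    t * sqrt(MSE) * sqrt(1 + 1/n + (x0 - XBar)^2 / sum((X_i - XBar)^2))."""
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    slope = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / sxx
    intercept = ybar - slope * xbar
    mse = sum((y - intercept - slope * x) ** 2 for x, y in zip(xs, ys)) / (n - 2)
    return t_value * math.sqrt(mse) * math.sqrt(1 + 1 / n + (x0 - xbar) ** 2 / sxx)

xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1, 12.0]
# t(4, .975) = 2.776 from a t table (n - 2 = 4 degrees of freedom)
at_center = prediction_half_width(xs, ys, 3.5, 2.776)   # x0 equals XBar
at_edge = prediction_half_width(xs, ys, 6.0, 2.776)
```

The leading "1 +" term accounts for the variability of the new observation, so the band cannot shrink below t * sqrt(MSE) even with a very large sample.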
Prediction Notes:
Designed for
95% Probability.
Bands centered
on Sample Line.
Width varies with
sd(Y|X) estimate
The population
points are not
usually known.
Notes:
95% Confidence.
Bands Centered
on Sample Line.
The Population
line is not
usually known.
Bands are wider
at the extremes.
Confidence Bands Discussion:
Confidence bands are computed using only sample data. The width and shape of the bands are determined from the data so that they are likely to capture the population regression line between them.
For a specific X value, the confidence bands are designed to capture the population mean with a certain probability, called the
confidence coefficient
. If the conditions of the normal errors regression model hold true, then the confidence interval
bands will work as designed.
Confidence Bands for the population regression line are based on sample data and are centered at the sample
regression line.
The + or - deviation at a particular value of X is the product of:
1. t(n-2,.975) (95% Confidence)
2. Square root of MSE (mean square error)
3. Square root of:
$$\frac{1}{n} + \frac{(X - \bar{X})^2}{\sum_{i=1}^{n} (X_i - \bar{X})^2}$$
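A minimal sketch of the confidence-band half-width at a single X value, showing why the bands are narrowest at the mean of X and wider at the extremes. The helper `confidence_half_width` and the example data are assumptions for illustration, and the tabled value t(4, .975) = 2.776 is passed in by the caller.

```python
import math

def confidence_half_width(xs, ys, x0, t_value):
    """Hypothetical helper: + or - deviation of a confidence band at X = x0,
    t * sqrt(MSE) * sqrt(1/n + (x0 - XBar)^2 / sum((X_i - XBar)^2))."""
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    slope = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / sxx
    intercept = ybar - slope * xbar
    mse = sum((y - intercept - slope * x) ** 2 for x, y in zip(xs, ys)) / (n - 2)
    return t_value * math.sqrt(mse) * math.sqrt(1 / n + (x0 - xbar) ** 2 / sxx)

xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1, 12.0]
# t(4, .975) = 2.776 from a t table (n - 2 = 4 degrees of freedom)
at_center = confidence_half_width(xs, ys, 3.5, 2.776)   # x0 equals XBar: narrowest
at_edge = confidence_half_width(xs, ys, 6.0, 2.776)     # wider at the extremes
```

Unlike prediction bands, these shrink toward zero width as the sample size grows, because they describe uncertainty in the line rather than in individual points.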
Samples will be
drawn from the
X-Y data points
in the population
to the right.
The population
regression line
is shown and the
values of the
slope and
intercept
parameters are
shown above.
The sampling
experiment is
next.
You can change
the switches for
Confidence,
Prediction, or
Pause either
now or during
the experiment
by pressing the C, D, or
P key
Parameter:
Int.=
Slope=
Estimate :
s.e. :
t-Stat :
n=
R-squared=
sd(Y|X)=
Sample
M 1 t-value t-values
% are between.
A histogram of the
t-statistics based on
the estimated and
true slope are shown
to the left. They
represent just a
fraction of all the
t-statistics that
could result from
samples of size
from the population.
The t-distribution
with
degrees of
freedom is also
shown.
t-statistics for the following two hypotheses were computed for each of
the
samples. The results are reported for 5% 2-sided t-tests.
Hypothesis H:Slope =
| H:Slope = 0
Rejected
Available Control Keys:
A-
utomatic Demonstration.
B-
Return to the
B eginning.
C-
onfidence interval switch.
D-
iction interval switch.
Q-
uit Sampling.
uit Program.
H-
elp Screen.
N-
Select
ew Parameters
and Sample Size.
P-
ause/Continuous Sampling.
Pgup-
Return to Previous Topic.
Space-
Continue to Next
Sample.
Topic.
Different keys may be available at
other points in the program.
The control keys are
available when the program is waiting for you to enter a number.